MixedGrad: An O(1/T) Convergence Rate Algorithm for Stochastic Smooth Optimization
Authors
Abstract
It is well known that the optimal convergence rate for stochastic optimization of smooth functions is O(1/√T), the same as for stochastic optimization of Lipschitz-continuous convex functions. This is in contrast to optimizing smooth functions using full gradients, which yields a convergence rate of O(1/T). In this work, we consider a new setup for optimizing smooth functions, termed Mixed Optimization, in which the algorithm can access both a stochastic oracle and a full gradient oracle. Our goal is to significantly improve the convergence rate of stochastic optimization of smooth functions by allowing a small number of additional calls to the full gradient oracle. We show that, with O(ln T) calls to the full gradient oracle and O(T) calls to the stochastic oracle, the proposed mixed optimization algorithm achieves an optimization error of O(1/T).
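The abstract describes the mixed-oracle setup only at a high level. The sketch below is not the paper's MixedGrad pseudocode; it is a minimal, SVRG-style illustration of the same idea on a least-squares problem, with all names (full_grad, stoch_grad, mixed_opt, epochs, inner_steps, eta) chosen for illustration. Each of the roughly O(ln T) epochs calls the full gradient oracle once at an anchor point, while the O(T) inner iterations use only the stochastic oracle with a variance-reduced correction.

import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def full_grad(w):
    # full gradient oracle: one pass over all n samples
    return A.T @ (A @ w - b) / n

def stoch_grad(w, i):
    # stochastic oracle: gradient of the loss at a single sample i
    return A[i] * (A[i] @ w - b[i])

def mixed_opt(epochs=10, inner_steps=1000, eta=1e-3):
    # epochs plays the role of O(ln T) full-gradient calls;
    # epochs * inner_steps plays the role of O(T) stochastic-oracle calls
    w_anchor = np.zeros(d)
    for _ in range(epochs):
        g_full = full_grad(w_anchor)      # expensive full gradient at the anchor point
        w = w_anchor.copy()
        for _ in range(inner_steps):
            i = rng.integers(n)
            # variance-reduced direction: stochastic gradient corrected by the anchor gradient
            g = stoch_grad(w, i) - stoch_grad(w_anchor, i) + g_full
            w -= eta * g
        w_anchor = w                      # next epoch starts from the last iterate
    return w_anchor

w_hat = mixed_opt()
print("final objective:", 0.5 * np.mean((A @ w_hat - b) ** 2))

Because the number of epochs grows only logarithmically in the total iteration budget, the cost of the full-gradient calls stays negligible compared to the stochastic steps, which is the point of the mixed setup.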
Similar Resources
Mixed Optimization for Smooth Functions
It is well known that the optimal convergence rate for stochastic optimization of smooth functions is O(1/√T), the same as for stochastic optimization of Lipschitz-continuous convex functions. This is in contrast to optimizing smooth functions using full gradients, which yields a convergence rate of O(1/T). In this work, we consider a new setup for optimizing smooth functions, termed as Mix...
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization
Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(log(T)/T), by running SGD for T iterations and returning the average point. However, recent results showed that using a different algorithm, one can get an optimal O(1/T) rate. This mig...
Zeroth-order Asynchronous Doubly Stochastic Algorithm with Variance Reduction
Zeroth-order (derivative-free) optimization attracts a lot of attention in machine learning, because explicit gradient calculations may be computationally expensive or infeasible. To handle problems that are large in both volume and dimension, asynchronous doubly stochastic zeroth-order algorithms were recently proposed. The convergence rate of existing asynchronous doubly stochastic zeroth-order ...
O(log T) Projections for Stochastic Optimization of Smooth and Strongly Convex Functions
Traditional algorithms for stochastic optimization require projecting the solution at each iteration into a given domain to ensure its feasibility. When facing complex domains, such as the positive semidefinite cone, the projection operation can be expensive, leading to a high computational cost per iteration. In this paper, we present a novel algorithm that aims to reduce the number of project...
Fast Stochastic Variance Reduced ADMM for Stochastic Composition Optimization
We consider the stochastic composition optimization problem proposed in [17], which has applications ranging from estimation to statistical and machine learning. We propose the first ADMM-based algorithm, named com-SVR-ADMM, and show that com-SVR-ADMM converges linearly for strongly convex and Lipschitz smooth objectives, and has a convergence rate of O(log S / S), which improves upon the O(S^{-4/9}) r...
Journal: CoRR
Volume: abs/1307.7192
Year: 2013